Data and Approach

We used IRS tax data for Tennessee to examine income levels by zip code, along with other financial variables, and looked for correlations with county-level school data. We analyzed the school and tax data separately and then examined how the two relate to each other. Our data sources are listed under Sources Cited.
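A minimal base R sketch of the zip-to-county step described above. The column names (zip_code, county, avg_agi, act_composite) and every value are made up for illustration, and the real IRS, crosswalk, and school tables will look different. The sketch attaches a county to each zip code through a crosswalk, averages AGI within each county, joins the result to county-level school data, and computes a Pearson correlation.

```r
# Toy zip-level tax data (hypothetical columns; values are made up)
tax <- data.frame(
  zip_code = c("37166", "37095", "37201", "37203", "38103", "38104"),
  avg_agi  = c(45, 40, 90, 70, 35, 55)   # thousands of dollars, illustrative only
)

# Toy zip-to-county crosswalk (hypothetical)
crosswalk <- data.frame(
  zip_code = c("37166", "37095", "37201", "37203", "38103", "38104"),
  county   = c("DeKalb", "DeKalb", "Davidson", "Davidson", "Shelby", "Shelby")
)

# Toy county-level school data (hypothetical)
school <- data.frame(
  county        = c("DeKalb", "Davidson", "Shelby"),
  act_composite = c(19.0, 20.5, 18.0)
)

# 1. Attach a county to every zip code
tax_county <- merge(tax, crosswalk, by = "zip_code")

# 2. Unweighted mean of the zip-level AGI values within each county
county_agi <- aggregate(avg_agi ~ county, data = tax_county, FUN = mean)

# 3. Join the county AGI to the school data on county name
combined <- merge(county_agi, school, by = "county")

# 4. Pearson correlation between county AGI and ACT Composite
r <- cor(combined$avg_agi, combined$act_composite)
print(combined)
print(r)
```

With three toy counties the correlation itself means nothing; the point is the order of operations. An unweighted mean treats a small zip code the same as a large one, so weighting by the number of returns per zip code may be more faithful if the tax data includes that count.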

Cleaning the Data

Some school districts reported a graduation rate of 0%. These values are almost certainly missing data rather than real rates, so we removed them to keep them from skewing the average graduation rate for each CORE region. We also found that county names did not always match exactly between the tax and school data. We used a zip code crosswalk to join the two datasets, and cleaned the county names so that DeKalb County was recognized consistently in both.
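The cleaning steps above can be sketched in base R as follows. The column names, district names, and graduation rates are hypothetical, and the normalize_county() helper is an illustrative assumption, not necessarily how the original cleaning was done. The sketch treats a 0% rate as missing and drops those rows, normalizes spelling variants such as "De Kalb" and "DeKalb County" to one key, and averages graduation rates per CORE region.

```r
# Toy school-district data (hypothetical column names and values)
school <- data.frame(
  system_name = c("District A", "District B", "District C", "District D"),
  county      = c("De Kalb", "Davidson", "Shelby", "DeKalb County"),
  CORE_region = c("Upper Cumberland", "Mid Cumberland",
                  "Southwest/Memphis", "Upper Cumberland"),
  grad_rate   = c(91.2, 0, 78.5, 88.0)
)

# A reported 0% graduation rate is treated as missing, then those rows are dropped
school$grad_rate[school$grad_rate == 0] <- NA
school <- school[!is.na(school$grad_rate), ]

# Hypothetical helper: strip a trailing "County", remove spaces, lower-case,
# so spelling variants of the same county produce the same join key
normalize_county <- function(x) {
  x <- sub("\\s*County$", "", x, ignore.case = TRUE)
  tolower(gsub("\\s+", "", x))
}
school$county_key <- normalize_county(school$county)

# Average graduation rate per CORE region, computed without the 0% rows
region_avg <- aggregate(grad_rate ~ CORE_region, data = school, FUN = mean)
print(region_avg)
```

Because the 0% row is removed before averaging, a region whose only district reported 0% drops out of the summary entirely instead of showing an artificially low average.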

Research

Our research also drew on School Funding, Taxes, and Economic Growth: An Analysis of the 50 States.
(To add: Smita's research paper, which found that increasing funding by 2% increased performance.)

Dual Enrollment for TN School Districts

(To add, Smita: enrollment, graduation, and ethnicity.)

#smita_df <- readRDS("smita.RDS")

Our Observations

Choropleth maps

Insert: average AGI

Insert: average ACT Composite

Insert: ACT scores by region (boxplot)

Graduation Rates by Region

library(plotly)

# ACT Composite against the AGI ratio, colored by AGI range;
# hover text shows the zip code and school system name.
plot_ly(combined_df, x = ~ratio_by_agi, y = ~ACT_Composite, color = ~agi_range,
        type = 'scatter',
        mode = 'markers',
        alpha = 0.3,
        text = ~paste('Zip Code: ', zip_code,
                      '<br> System Name: ', system_name))
## Warning: Ignoring 813 observations
# Not run: layout settings and a graduation-rate boxplot layer
#   layout(xaxis = x_grad, yaxis = y_grad, legend = list(orientation = 'h')) %>%
#   add_trace(agi_grad_plot, y = ~mean_grad, color = ~CORE_region, type = "box")
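plotly's "Ignoring 813 observations" warning typically means that many rows had a missing value in a plotted variable. A quick base R check, sketched here on a small hypothetical stand-in for combined_df, counts the rows missing either axis value and keeps only the plottable ones, so the dropped rows can be inspected instead of silently discarded.

```r
# Toy stand-in for combined_df (hypothetical values)
combined_df <- data.frame(
  zip_code      = c("37166", "37201", "38103", "38104"),
  ratio_by_agi  = c(0.8, NA, 1.2, 0.5),
  ACT_Composite = c(19.1, 21.0, NA, 17.5)
)

# Rows missing either plotted variable cannot be drawn as points
missing_xy <- !complete.cases(combined_df[, c("ratio_by_agi", "ACT_Composite")])
n_missing  <- sum(missing_xy)
print(n_missing)

# Inspect the dropped rows, then keep only plottable rows for plot_ly()
print(combined_df[missing_xy, ])
plot_data <- combined_df[!missing_xy, ]
```

If the count on the real data comes out near 813, the warning is explained by missing values. A large share of missing rows would be worth tracing back to the zip-to-county join.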

The Southwest/Memphis CORE region appears to have more variation in ACT Composite scores than any other region in the state.
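One way to put a number on "more variation" is to compare the standard deviation and interquartile range of ACT Composite within each region. The minimal base R sketch below uses made-up scores for two regions; on the real data the same two tapply() calls would run over every CORE region.

```r
# Toy ACT Composite scores by CORE region (hypothetical values)
act <- data.frame(
  CORE_region   = rep(c("Southwest/Memphis", "Upper Cumberland"), each = 4),
  ACT_Composite = c(15.0, 17.5, 21.0, 24.0,
                    18.5, 19.0, 19.5, 20.0)
)

# Spread of ACT Composite within each region
sd_by_region  <- tapply(act$ACT_Composite, act$CORE_region, sd)
iqr_by_region <- tapply(act$ACT_Composite, act$CORE_region, IQR)

# Regions ordered from most to least variable (by standard deviation)
spread <- data.frame(sd = sd_by_region, iqr = iqr_by_region)
print(spread[order(-spread$sd), ])
```

The IQR is less sensitive than the standard deviation to a few extreme districts, so reporting both helps show whether the Southwest/Memphis spread is broad or driven by a handful of outliers.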


Grades Matter!

Using the state achievement data, we predicted a county's average ACT Composite score from its proficiency rates in four key subjects: Algebra I, Chemistry, Math, and ELA. Holding out DeKalb County as a test county, we fit a linear regression on the other 94 Tennessee counties (with outliers removed) and were able to predict DeKalb's ACT Composite score of 19.1.
lm(formula = ACT_Composite ~ AlgI + Chemistry + Math + ELA, data = school_cross_no_dekalb_no_outliers)
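The holdout approach above can be sketched in base R on synthetic data. All 95 "counties" and their proficiency rates are randomly generated, the relationship used to create ACT_Composite is invented, and County_1 simply stands in for DeKalb. The sketch skips the outlier-removal step because its criteria are not described here. It fits the same four-predictor lm() formula on the remaining 94 counties and predicts the held-out county with predict().

```r
set.seed(42)

# Synthetic county-level data: 95 counties, proficiency rates in percent
n <- 95
counties <- data.frame(
  county    = paste0("County_", seq_len(n)),
  AlgI      = runif(n, 20, 70),
  Chemistry = runif(n, 20, 70),
  Math      = runif(n, 20, 70),
  ELA       = runif(n, 20, 70)
)
# Invented linear relationship plus noise, only so there is something to fit
counties$ACT_Composite <- 14 + 0.03 * counties$AlgI + 0.02 * counties$Chemistry +
  0.02 * counties$Math + 0.04 * counties$ELA + rnorm(n, sd = 0.5)

# Hold out one county (County_1 plays the role of DeKalb)
holdout <- "County_1"
train <- counties[counties$county != holdout, ]
test  <- counties[counties$county == holdout, ]

# Fit on the remaining 94 counties, then predict the held-out county
fit  <- lm(ACT_Composite ~ AlgI + Chemistry + Math + ELA, data = train)
pred <- predict(fit, newdata = test)

cat("Predicted:", round(pred, 1), " Actual:", round(test$ACT_Composite, 1), "\n")
```

A single held-out county is thin evidence of accuracy. Repeating the same fit-and-predict loop with each of the 95 counties held out in turn (leave-one-out) would give a fuller picture of typical prediction error.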

Sources Cited

Packages Used

  • readxl
  • ggplot2
  • knitr
  • dplyr
  • tidyr
  • PerformanceAnalytics
  • maps
  • mapdata
  • mapproj
  • ggmap
  • plotly